Grapheme-based Spoken Term Detection in the Meetings Domain

Authors

  • Dong Wang
  • Joe Frankel
  • Simon King
Abstract

Information retrieval from spoken audio has attracted the attention of a number of research groups, in part driven by the recent NIST Spoken Term Detection (STD) evaluation. A common approach is to split the task into two stages. In the first, a large vocabulary continuous speech recognition (LVCSR) system is used to generate a word or phone lattice corresponding to the audio, and in the second, lattice search is used to determine likely occurrences of the search terms. Searching a word-based lattice works well for terms which occur in the LVCSR system's vocabulary. However, search terms naturally tend toward proper nouns, which leads to higher out-of-vocabulary (OOV) rates than are found in transcription tasks. A standard method for dealing with OOV terms is to generate a phone sequence corresponding to each term, which may then be searched for in a phone lattice. In this work, we propose using context-dependent graphemes (CDG) as sub-word units for spoken term detection, in particular for out-of-vocabulary search terms. In essence, this approach moves pronunciation modelling away from the letter-to-sound rules which are used to generate phone strings, and into the Gaussian mixture models which describe the observation space. This removes the need to make potentially error-prone hard decisions at an early stage of processing. In addition, words which have multiple pronunciations have a single grapheme representation, which simplifies the subsequent search. Large text corpora can be used to train long-span grapheme-based language models for use in lattice generation. These language models have words implicit within them, though, given suitable smoothing, they can also support previously unseen words. In this work, we first present the results of phone and grapheme recognition, in addition to word recognition based on phone and grapheme sub-word units.
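
To illustrate why a grapheme representation needs no letter-to-sound step, the sketch below expands a search term into context-dependent grapheme labels. The HTK-style left-centre+right label notation, the one-letter context window, the word-internal context, and the function name are assumptions made for illustration; the abstract does not specify how the CDG units are defined.

```python
def to_context_dependent_graphemes(term):
    """Expand a search term into context-dependent grapheme labels.

    Assumption: labels use the HTK-style "left-centre+right" notation with
    one letter of context on each side, and context does not cross word
    boundaries. The abstract does not state the actual unit inventory.
    """
    units = []
    for word in term.lower().split():
        # Keep alphabetic characters only; punctuation carries no grapheme unit.
        letters = [ch for ch in word if ch.isalpha()]
        for i, centre in enumerate(letters):
            label = centre
            if i > 0:
                label = letters[i - 1] + "-" + label  # left context
            if i < len(letters) - 1:
                label = label + "+" + letters[i + 1]  # right context
            units.append(label)
    return units


if __name__ == "__main__":
    print(to_context_dependent_graphemes("king"))
```

Because the unit sequence is derived directly from the spelling, an out-of-vocabulary term maps to exactly one sequence, with no pronunciation dictionary or letter-to-sound rules involved.
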
On the RT04s independent headset microphone (IHM) test condition, we find that the word error rate (WER) using phone sub-word units is lower than that with graphemes: 44.5% compared to 54.5%. The phone error rate (PER) is 48.2%, slightly higher than the grapheme error rate (GER) of 46.3%, though these are not directly comparable as there are fewer graphemes than phones. We then present results on a spoken term detection (STD) task. Again using the RT04s test set, 78 in-vocabulary words and 64 out-of-vocabulary words were selected as search terms from the reference transcription. HTK was used to generate word or sub-word lattices, and a tool developed at Brno [1] used to …
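
The WER, PER and GER figures above are all token error rates of the same kind. As a minimal sketch, assuming the usual definition (substitutions plus deletions plus insertions, divided by the number of reference tokens), the rate can be computed with an edit-distance alignment. The function name and the toy sequences are illustrative only and are not taken from the paper or from HTK's scoring tools.

```python
def error_rate(reference, hypothesis):
    """Token error rate: edit distance divided by the reference length.

    Tokens may be words (WER), phones (PER) or graphemes (GER).
    Assumption: unit cost for substitutions, deletions and insertions.
    """
    ref = list(reference)
    hyp = list(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one token")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis tokens.
    prev = list(range(len(hyp) + 1))
    for i, ref_token in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_token in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_token == hyp_token else 1)
            deletion = prev[j] + 1       # reference token missing from hypothesis
            insertion = curr[j - 1] + 1  # extra token in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution and one deletion against four reference tokens.
    print(error_rate("a b c d".split(), "a x c".split()))
```

Since the denominator is the number of reference tokens and the unit inventories differ in size, a PER and a GER computed this way are not directly comparable, as the abstract notes.
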


Related resources

A comparison of grapheme and phoneme-based units for Spanish spoken term detection

The ever-increasing volume of audio data available online through the world wide web means that automatic methods for indexing and search are becoming essential. Hidden Markov model (HMM) keyword spotting and lattice search techniques are the two most common approaches used by such systems. In keyword spotting, models or templates are defined for each search term prior to accessing the speech a...

Spoken Term Detection for Persian News of Islamic Republic of Iran Broadcasting

Islamic Republic of Iran Broadcasting (IRIB), as one of the biggest broadcasting organizations, produces thousands of hours of media content daily. Accordingly, IRIB's archive is one of the richest archives in Iran, containing a huge amount of multimedia data. Monitoring this massive volume of data, and browsing and retrieving this archive, is one of the key issues for this broadcasting...

Stochastic Pronunciation Modelling for Out-of-Vocabulary Spoken Term Detection

Spoken term detection (STD) is the name given to the task of searching large amounts of audio for occurrences of spoken terms, which are typically single words or short phrases. One reason that STD is a hard task is that search terms tend to contain a disproportionate number of out-of-vocabulary (OOV) words. The most common approach to STD uses subword units. This, in conjunction with some meth...

Newborn EEG Seizure Detection Based on Interspike Space Distribution in the Time-Frequency Domain

This paper presents a new time-frequency based EEG seizure detection method. This method uses the distribution of interspike intervals as a criterion for discriminating between seizure and nonseizure activities. To detect spikes in the EEG, the signal is mapped into the time-frequency domain. The high instantaneous energy of spikes is reflected as a localized energy in time-frequency domain. Hi...

Fast decoding for open vocabulary spoken term detection

Information retrieval and spoken-term detection from audio such as broadcast news, telephone conversations, conference calls, and meetings are of great interest to the academic, government, and business communities. Motivated by the requirement for high-quality indexes, this study explores the effect of using both word and sub-word information to find in-vocabulary and OOV query terms. It also ...



Publication date: 2007